Linux package managers have to deal with the dependencies and conflicts of the packages that the 
user asks to install. Since this is an NP-complete problem, it is a hard task to solve. In this context, 
several approaches have been pursued. Apt-pbo is a package manager based on the apt project that 
encodes the dependency solving problem as a pseudo-Boolean optimization (PBO) problem. This 
paper compares different PBO solvers and their effectiveness in solving the dependency solving 
problem. 



1 Introduction 

Software installation is the process of installing programs while ensuring that specifically required software is 
pre-installed and that defined actions are taken before or after the files are copied into the file system 
[22, 24]. Although this is a common problem among Microsoft and Open Source operating systems 
(GNU/Linux, BSD, ...) [5], we will focus on the latter, since progress in this field would be 
applicable to all environments, including applications like Eclipse or Firefox [15]. 

The installation process comprises retrieving the package, solving the software dependency tree, 
retrieving and installing the software dependencies and finally installing the package and executing the 
associated install scripts [8]. 

The dependency graph represents the software dependencies and sub-dependencies needed for a 
package to work properly after installation [5]. The restrictions imposed by the graph may have no 
solution (for instance, due to broken dependencies), only one solution, or several solutions. Criteria 
such as the minimum number of packages or freshness can be defined to rank the solutions in terms of 
their quality. Finding a solution consists of identifying the subset of packages that meets the dependency 
requirements. This process is called dependency solving. One approach to dependency solving is to 
encode the problem as a pseudo-Boolean optimization (PBO) problem and use existing solvers to find 
the optimal solutions. This approach is applied in apt-pbo, a meta-installer tool based on apt that is 
described in this paper. 

This paper is organized as follows. Section 2 provides background information about PBO. 
Section 3 describes the apt-pbo tool and its architecture. Section 4 presents empirical results of experiments 
conducted with several solvers. Finally, Section 6 presents the concluding remarks. 

2 Background 

Pseudo-Boolean Optimization (PBO) is a special case of Integer Linear Programming (ILP) where vari- 
ables are Boolean. For this reason, it is often called 0-1 ILP. This is the case of our package selection 
problem, where a package being present in the final solution can be easily encoded as a Boolean variable 
being assigned value 0 or 1. 
Pseudo-Boolean functions are a generalization of Boolean functions, given by a mapping 
$\mathbb{B}^n = \{0,1\}^n \to \mathbb{R}$ [7]. Pseudo-Boolean functions in polynomial form are widely used in optimization models in 
different areas like statistics, computer science, VLSI design and operations research. 

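A PBO problem can be formally defined as follows [2]: 

$$
\begin{array}{ll}
\text{minimize}   & \sum_{j \in N} c_j \cdot x_j \\
\text{subject to} & \sum_{j \in N} a_{ij} l_j \geq b_i, \\
                  & x_j \in \{0,1\},\; a_{ij}, b_i, c_j \in \mathbb{N}_0^+,\; i \in M, \\
                  & M = \{1, \ldots, m\}
\end{array}
\qquad (1)
$$
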
where each $c_j$ is a non-negative integer cost associated with variable $x_j$, $j \in N$, and $a_{ij}$ denotes the 
coefficients of the literals $l_j$ in the set of $m$ linear constraints, a literal being a Boolean variable or its 
negation. 
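
To make definition (1) concrete, the following minimal sketch enumerates all 0/1 assignments of a tiny instance and keeps the cheapest feasible one. It is only an illustration of the problem statement: the function name `solve_pbo` and the triple-based constraint representation are assumptions made here, and exhaustive enumeration is not how the solvers evaluated in this paper work.

```python
from itertools import product


def solve_pbo(costs, constraints):
    """Exhaustively minimise sum(costs[j] * x[j]) subject to linear constraints.

    costs:       list of non-negative integers c_j, one per variable x_j
    constraints: list of (terms, bound) pairs; terms is a list of
                 (coefficient, variable_index, negated) triples and the
                 constraint reads  sum(coefficient * literal) >= bound
    Returns (best_cost, best_assignment), or (None, None) if unsatisfiable.
    """
    best_cost, best_x = None, None
    for x in product((0, 1), repeat=len(costs)):
        # A literal is a variable or its negation (1 - x_j).
        def literal(j, negated):
            return 1 - x[j] if negated else x[j]

        feasible = all(
            sum(a * literal(j, neg) for a, j, neg in terms) >= b
            for terms, b in constraints
        )
        if not feasible:
            continue
        cost = sum(c * v for c, v in zip(costs, x))
        if best_cost is None or cost < best_cost:
            best_cost, best_x = cost, x
    return best_cost, best_x


# Hypothetical instance:
#   minimise   x0 + 2*x1 + 3*x2
#   subject to x0 + x1 >= 1   and   (not x0) + x2 >= 1
costs = [1, 2, 3]
constraints = [
    ([(1, 0, False), (1, 1, False)], 1),
    ([(1, 0, True), (1, 2, False)], 1),
]
result = solve_pbo(costs, constraints)
```

The enumeration visits all $2^n$ assignments, so it is only usable for a handful of variables; it serves as a reference against which the meaning of the cost vector, the literals and the bounds can be checked.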

Recent algorithms for solving the PBO problem integrate features from recent advances in Boolean 
satisfiability (SAT) and classical branch and bound algorithms. 



3 System Overview 

3.1 Architecture 

Apt is a meta-installer widely used in Linux distributions. However, apt solves dependencies in a very 
straightforward way and in a large number of cases fails to deliver a solution. 

The apt-pbo application [23] belongs to a new generation of meta-installers that are not only capable 
of finding a solution but are also flexible enough to let the user customize which solution best fits their needs. 

The architecture of apt-pbo has different hooks to integrate modules, which allows modules to be 
exchanged. For example, changing the PBO solver being used is an extremely easy task. 

In our tests, the overhead of the external calls is not significant, since the number of iterations is 
extremely low. 

Figure 1 depicts a typical installation flow of apt-pbo. 
Figure 1: High-level processing flow of apt-pbo 

The apt-pbo application is called with the operation install and the desired package as arguments, 
mirroring the usage of apt-get. 

The components of the figure have the following roles: 
• apt-get pbo-install: we use a modified version of the apt-get installation software. Apt is one of the 
most used meta-installers and is adopted by Linux distributions like Debian, Ubuntu and Caixa 
Magica. The modifications introduced by the author created a new method called pbo-install 
which, given a specific package, calculates the dependency tree and writes the PBO encoding. The 
PBO encoding is composed of a set of PB-constraints and an objective function. 

• PBO solver: the problem.pbo formula is solved by the PBO solver. We have used and tested 
different solvers, as detailed in Section 4. 

• parsing solution: apt-pbo has a module that parses the solver solution and, if necessary, 
starts a new iteration with apt-get pbo-install. 

• apt-get install solution: when the final package set solution is reached, the user is asked for 
permission and the removal and installation of packages are performed using apt and dpkg / rpm. 

3.2 PBO encoding 

As presented in the previous section, apt-pbo pbo-install encodes the problem as a Pseudo-Boolean 
Optimization problem. 

This encoding has two parts: the definition of the constraints and the definition of the objective function. 

Constraints definition 

In a pseudo-Boolean formula, variables have Boolean domains and constraints are linear inequalities 
with integer coefficients. 

Encoding the relations of the dependency tree as constraints is a straightforward task. The following 
translations will be used: 

• Installation: if $p_1$ is the package that we want to install: $p_1 \geq 1$. 

• Dependency: $p_1$ depends on $x_1$ should be represented as $x_1 - p_1 \geq 0$. This means that installing $p_1$ 
implies installing $x_1$ as well, although $x_1$ may be installed without $p_1$. If $p_1$ also depends on $y_1$, we 
should add $y_1 - p_1 \geq 0$. 

• Multiple versions: if a package $p_1$ requires the installation of a package $x$ having different 
versions, for example $x_1$ and $x_2$, then we should encode the requirement that installing package $p_1$ 
requires installing either package $x_1$ or package $x_2$. Hence, such a requirement may be encoded with the 
constraint $x_1 + x_2 - p_1 \geq 0$. 

• Conflicts: if a package has an explicit conflict with another package, for instance if $y_1$ conflicts with 
$x_1$, then this conflict is encoded as $x_1 + y_1 \leq 1$. Remember that there is a conflict for each pair of 
different packages corresponding to the same unit. 
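
The translations above can be sketched as a few small helper functions. This is a minimal illustration, not the code of apt-pbo: the helper names (`install`, `depends`, `conflicts`, `render`, `satisfied`) are hypothetical, and the textual form produced by `render` is only loosely modelled on the linear input formats accepted by PBO solvers, since the exact layout of the file written by pbo-install is not given in the text. Conflicts are rewritten from $x_1 + y_1 \leq 1$ to the equivalent $-x_1 - y_1 \geq -1$ so that every constraint uses the same relation.

```python
def install(pkg):
    # Installation: p >= 1
    return ({pkg: 1}, 1)


def depends(pkg, alternatives):
    # Dependency on one of several alternatives: x1 + ... + xk - p >= 0.
    # With a single alternative this is the plain dependency x1 - p >= 0.
    terms = {alt: 1 for alt in alternatives}
    terms[pkg] = -1
    return (terms, 0)


def conflicts(a, b):
    # Conflict: a + b <= 1, rewritten as -a - b >= -1
    return ({a: -1, b: -1}, -1)


def render(constraint):
    # Textual form of a constraint, e.g. "+1 x1 -1 p1 >= 0 ;"
    terms, bound = constraint
    lhs = " ".join(f"{coef:+d} {name}" for name, coef in terms.items())
    return f"{lhs} >= {bound} ;"


def satisfied(constraint, assignment):
    # Check a constraint against a 0/1 assignment of package variables.
    terms, bound = constraint
    return sum(coef * assignment[name] for name, coef in terms.items()) >= bound


# Hypothetical request: install p1, which depends on x (versions x1, x2)
# and on y1, while x1 conflicts with y1.
problem = [
    install("p1"),
    depends("p1", ["x1", "x2"]),
    depends("p1", ["y1"]),
    conflicts("x1", "y1"),
]
lines = [render(c) for c in problem]

good = {"p1": 1, "x1": 0, "x2": 1, "y1": 1}
bad = {"p1": 1, "x1": 1, "x2": 0, "y1": 1}
good_ok = all(satisfied(c, good) for c in problem)
bad_ok = all(satisfied(c, bad) for c in problem)
```

In this example the assignment `good` picks version $x_2$ and therefore avoids the conflict, whereas `bad` selects both $x_1$ and $y_1$ and violates the conflict constraint.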

Objective Function Definition 

In the objective function we define what we plan to minimize. 

Two approaches may be followed: minimizing a single criterion (e.g. the number of packages) or 
minimizing multiple criteria. 

We will start by presenting the single-criterion functions. 

Minimizing Package Removal 

To minimize the number of removed packages, even if newer packages exist, one should use the 
following objective function, where $PI_1, \ldots, PI_k$ is the set of packages already installed: 

$$f_1(P) = \min\, (1 - PI_1) + \ldots + (1 - PI_k)$$

In order to minimize the objective function, the solver will try to set each variable $PI_i$ to 1, which 
implies not removing installed applications. 

For simplicity, the tuple representation of a package as (p, 2) will now be written as $p_2$. 

Minimizing the Number of Installed Packages 

In this case, the total number of packages installed in the system is to be minimized. Taking $P_1, \ldots, P_N$ 
as the packages targeted to be installed, either already present or new, the objective function is: 

$$f_2(P) = \min\, P_1 + \ldots + P_N$$

Maximizing the Freshness of Packages 

Consider $P_{1,1}, \ldots, P_{1,k_1}$ to be the different software versions or releases of package $P_1$. Also, consider $v(P_{1,i})$ 
to be the normalized distance (a constant, for the purposes of the PBO problem) between the package 
version $P_{1,i}$ and the newest version present in repository $R$. Then the optimization function is: 

$$f_3(P) = \min\, \left( P_{1,1} \cdot v(P_{1,1}) + \ldots + P_{1,k_1} \cdot v(P_{1,k_1}) \right) + \ldots + \left( P_{n,1} \cdot v(P_{n,1}) + \ldots + P_{n,k_n} \cdot v(P_{n,k_n}) \right)$$

The value of $v(P_{i,j})$ is zero if the package version is the newest in the repository. 

Multicriteria optimization 

In the real world, however, installing a package involves multiple criteria, and optimizing a single 
criterion, even the most important one, can lead to undesired solutions. 

Trying to satisfy different criteria when finding the set of packages for a software installation falls in 
the multicriteria decision making (MCDM) set of problems lITTl . 

Apt-pbo integrates the different objective functions of the previous section as a multiobjective 
problem (MOP): 

$$\min\, (f_1(P), f_2(P), f_3(P))$$

with $P$ as the available packages and $f_1$, $f_2$ and $f_3$ as the existing objective functions. 

The multiobjective problem is solved by transforming it into a single-objective problem through weighted 
sum scalarization. 

Apt-pbo uses the following coefficients $\lambda$, representing the overall utility for the user: Removal 
Cost $W_r$ (the weight given to the cost of removing a package), Presence Cost $W_p$ (the weight given to the 
presence of a new or an already installed package) and Version Cost $W_v$ (the weight representing the cost 
of having an older version in the solution when a newer one exists). 

The objective function is then defined as: 

$$\min\, (W_r \cdot f_1(P) + W_p \cdot f_2(P) + W_v \cdot f_3(P))$$
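
The effect of the weights can be illustrated by evaluating the scalarized cost of two candidate solutions. The sketch below is an assumption-laden illustration rather than the apt-pbo implementation: the function names, the package names, the distance values and the weight values are all hypothetical, and an upgrade is modelled here as removing the installed version and adding the newer one.

```python
def removal_cost(installed, selection):
    # f1: installed packages that are absent from the selection
    return sum(1 - selection.get(p, 0) for p in installed)


def presence_cost(selection):
    # f2: number of packages present in the selection
    return sum(selection.values())


def version_cost(selection, distance):
    # f3: summed normalised distance to the newest version (0 for the newest)
    return sum(selection[p] * distance.get(p, 0.0) for p in selection)


def total_cost(installed, selection, distance, w_r, w_p, w_v):
    # Weighted sum scalarization of the three objective functions
    return (w_r * removal_cost(installed, selection)
            + w_p * presence_cost(selection)
            + w_v * version_cost(selection, distance))


# Hypothetical system: a_1 and b_1 are installed, c_1 is requested,
# and a newer version a_2 of package a exists in the repository.
installed = ["a_1", "b_1"]
distance = {"a_1": 0.5, "a_2": 0.0, "b_1": 0.0, "c_1": 0.0}

keep = {"a_1": 1, "a_2": 0, "b_1": 1, "c_1": 1}      # keep the old version of a
upgrade = {"a_1": 0, "a_2": 1, "b_1": 1, "c_1": 1}   # replace a_1 by a_2

# Weights that penalise removals heavily favour keeping a_1 ...
keep_cost_r = total_cost(installed, keep, distance, w_r=10, w_p=1, w_v=4)
upgrade_cost_r = total_cost(installed, upgrade, distance, w_r=10, w_p=1, w_v=4)

# ... while a low removal weight favours the upgrade.
keep_cost_v = total_cost(installed, keep, distance, w_r=1, w_p=1, w_v=4)
upgrade_cost_v = total_cost(installed, upgrade, distance, w_r=1, w_p=1, w_v=4)
```

With a removal weight of 10 the conservative candidate is cheaper, whereas with a removal weight of 1 the upgraded candidate wins, which shows how the same constraint set can yield different preferred solutions depending on the chosen coefficients.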
4 Experimental Results 

We performed experiments on a large set of different repositories, packages and systems hosted at the 02H 
Lab cluster of 164 Xeon CPU cores¹, with Linux installed in Xen virtual machines and inside a 
chroot environment. In what follows we report the results of this evaluation. 

The goal of the experiments was to simulate the installation of software in a Linux environment 
and to test the different PBO solvers against the same criteria. 

SAT and PBO solvers have been compared extensively through international competitions 
and benchmarks [3, 18, 9]. Since a solving algorithm can benefit greatly from the structure of 
the problem, it was considered important to evaluate different PBO solvers on this specific problem. As 
mentioned in Section 3, apt-pbo is structured in a modular form, thus allowing the replacement of one 
PBO solver by another compatible solver. 

For testing purposes, four solvers were considered: 

• minisat+ [10]: from the same authors as minisat, a well-known SAT solver, and actually based on 
minisat, minisat+ encodes PB-constraints into SAT. 

• bsolo [12]: bsolo is a PBO solver, which was first designed to solve instances of the Unate and 
Binate Covering Problems (UCP/BCP) and later updated with pseudo-Boolean constraints support. 

• wbo lITTl: from some of the same authors as bsolo; it participated in the PB'09 competition. 

• opbdp [2]: an implementation in C++ of an implicit enumeration algorithm for solving PBO. 



Besides the solvers mentioned above, Pueblo Ell was also considered but not included since the only 
available version is dynamically linked and the libraries it needs are old and not available in the testing 
infrastructure. Nevertheless, an old Linux system (Debian Etch) was installed and some ad hoc tests 
were performed with Pueblo. These tests revealed that Pueblo performs poorly on this specific 
type of problem, so no further effort was made to port Pueblo. 

The tests consisted of 1,000 package installations on a Debian Lenny Linux system. Two different 
scenarios were tested: "conservative" and "aggressive". 

The weights in the objective function (Section 3.2) are the same in both scenarios, adopting a balanced 
configuration between updates and removals. 

The difference lies in the active repositories. In the "conservative" scenario only the Lenny repositories 
were active (main and updates). In the "aggressive" scenario the Sid (development version) and Backports 
repositories were also present. Table 1 summarizes the differences between the scenarios. In fact, about 12,000 more 
packages were present in the "aggressive" scenario, and the total space accounted for 
by apt-pbo for mapping packages, dependencies and conflicts more than doubled. 

4.1 Aggressive scenario 

Table 2 summarizes the results of the evaluation performed in the context of the aggressive scenario. 

As we can observe, both wbo and bsolo are able to solve all the instances, but wbo performs 
better (4.45 seconds on average per transaction, against 7.79 seconds for bsolo) and also has a smaller 
standard deviation than bsolo. Minisat+ comes in third place, not only with a 
lower number of instances solved, 355, but also with poorer performance, taking on average more than 
two minutes to solve a problem. The average time is the mean time per installation transaction. 



¹ The infrastructure is integrated in the ADETTI / ISCTE centre of RNG Grid. 
Table 1: Characterization of packages - conservative and aggressive scenarios 

Measures                          Conservative   Aggressive 
Total package names                      30014        42007 
Total distinct versions                  24100        51337 
Total dependencies                      147085       326891 
Total Provides mappings                   5146        10962 
Total dependency version space            602k        1358k 
Total space accounted for                7284k        14.9M 


Table 2: PBO solvers benchmarking - Aggressive scenario 

                               bsolo      wbo        minisat+   opbdp 
# Solved                       1,000      1,000      355        47 
# Timeouts                     0          0          645        953 
Average time (mm:ss)           00:07.79   00:04.45   02:30.16   07:16.49 
Standard deviation (mm:ss)     00:02.83   00:01.19   01:29.33   35:13.02 



Figure 2 compares wbo and bsolo while varying the number of installed packages per transaction. Wbo 
shows smooth growth, whereas bsolo grows in a more unstable and much less predictable 
fashion. Since minisat+ and opbdp had a significant number of timeouts, they were not included in the 
graph. 



4.2 Conservative scenario 

In the conservative scenario, development repositories are not active and therefore the environment 
for dependency solving is much more stable. 

In this case, all four solvers were able to find the solutions before the timeout of 150 seconds. In 
fact, they took under 3 seconds on average, with the exception of minisat+. 

Table 3: PBO solvers benchmarking - Conservative scenario 

                               wbo        bsolo      minisat+   opbdp 
# Solved                       1000       1000       1000       1000 
# Timeouts                     0          0          0          0 
Average time (mm:ss)           00:02.6    00:02.62   00:06.22   00:02.55 
Standard deviation (mm:ss)     00:00.8    00:01.1    00:01.4    00:01.1 



Figure 3 depicts the size of the problem versus time. Although on average opbdp performs better than 
minisat+, the figure shows that, as the size of the problem grows, opbdp is more sensitive to peaks and 
outliers. 
Figure 2: PBO solvers graph - Aggressive scenario 

Figure 3: PBO solvers graph - Conservative scenario 

5 Related Work 

The use of Boolean Satisfiability (SAT) i^j for solving the dependency problem has first been proposed 
in the context of the EDOS FP6 project [16, 6], which had an impact on other research efforts [13]. An 
alternative formulation using constraint programming techniques has been described in [20], including 
the use of different heuristics for improving the quality of the solution found. 

6 Conclusions 

The PBO solvers evaluated follow different theoretical approaches and are therefore expected to produce 
different results. Some results of the tests are nevertheless worth recalling: wbo is the solver 
that performed best overall, being the fastest in the aggressive scenario and within 0.05 seconds of the 
fastest average time in the conservative one, and it showed the most stable behaviour. Bsolo also has interesting 
results in both scenarios. 

Although wbo is the solver with the best time results, there are other aspects to take into account: 
minisat+ is open source and can be enhanced to address more difficult problems, such as those presented in the 
aggressive scenario. Being open source is a critical point for a Linux distribution that might adopt such a 
tool. 

Future work will consist of analysing, jointly with the authors of the PBO tools, possible 
enhancements of the tools as a result of this evaluation. Another direction for future work is to study the 
possibility of the solvers returning a non-optimal solution when the timeout is reached. 

Finally, this article can be extended to study other solvers such as SCIP [1] and Boolean optimization 
engines such as SAT4JPB yjj or MsUnCore Gil. 